A Wavelet Packet and Mel-Frequency Cepstral Coefficients-Based Feature Extraction Method for Speaker Identification
Abstract
One of the most widely used approaches for feature extraction in speaker recognition is the filter bank-based Mel Frequency Cepstral Coefficients (MFCC) approach. The main goal of feature extraction in this context is to extract features from raw speech that capture the unique characteristics of a particular individual. During the feature extraction process, the discrete Fourier transform (DFT) is typically employed to compute the spectrum of the speech waveform. However, over the past few years, the discrete wavelet transform (DWT) has gained remarkable attention and has been favored over the DFT in a wide variety of applications. The wavelet packet transform (WPT) is an extension of the DWT that adds more flexibility to the decomposition process. This work studies the impact on performance, with respect to accuracy and efficiency, when the WPT is used as a substitute for the DFT in the MFCC method. The novelty of our approach lies in its concentration on the wavelet and the decomposition level as the parameters influencing the performance. We compare the performance of the DFT with the WPT, as well as with our previous work using the DWT. It is shown that, relative to the DFT, the WPT yields a significantly lower order for the Gaussian Mixture Model (GMM) used to model speech and a marginal improvement in accuracy. The WPT mirrors the DWT in terms of GMM order and can perform as well as the DWT under certain conditions. © 2015 The Authors. Published by Elsevier B.V. Peer-review under responsibility of scientific committee of Missouri University of Science and Technology.
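To make the idea of swapping the DFT for a wavelet packet decomposition concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Haar wavelet, a full packet tree of depth `level`, and log subband energies standing in for the filter bank outputs before the cepstral DCT step. All function names and default parameters (`haar_step`, `wavelet_packet_leaves`, `wpt_cepstral_features`, `level=3`, `n_coeffs=6`) are hypothetical and chosen only for illustration; the abstract does not state which wavelets, levels, or filter bank arrangement the paper uses.

```python
import math


def haar_step(x):
    """One level of the Haar analysis filter bank: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet_leaves(frame, level):
    """Full wavelet packet tree: every node is split, not only the approximation.

    A plain DWT would keep splitting only the approximation branch.
    Returns 2**level subbands in natural (filter) order, which is not
    strictly increasing in frequency.
    """
    nodes = [list(frame)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def dct_ii(values):
    """Unnormalised DCT-II, the transform used for the cepstral step in MFCC."""
    n = len(values)
    return [
        sum(values[k] * math.cos(math.pi * (k + 0.5) * j / n) for k in range(n))
        for j in range(n)
    ]


def wpt_cepstral_features(frame, level=3, n_coeffs=6, eps=1e-12):
    """Assumed pipeline: WPT subbands -> log energies -> DCT -> first coefficients."""
    leaves = wavelet_packet_leaves(frame, level)
    log_energies = [math.log(sum(c * c for c in leaf) + eps) for leaf in leaves]
    return dct_ii(log_energies)[:n_coeffs]


# Example: a 64-sample frame containing a single sinusoid.
frame = [math.sin(2.0 * math.pi * 5.0 * n / 64.0) for n in range(64)]
leaves = wavelet_packet_leaves(frame, 3)
features = wpt_cepstral_features(frame, level=3, n_coeffs=6)
```

In this sketch the two knobs the abstract highlights, the wavelet and the decomposition level, correspond to the choice of analysis filters in `haar_step` and to the `level` argument. Because the Haar filters are orthonormal, the subband energies sum to the energy of the original frame when the frame length is divisible by `2**level`.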
Similar resources
Wavelet-Based Mel-Frequency Cepstral Coefficients for Speaker Identification using Hidden Markov Models
To improve the performance of speaker identification systems, an effective and robust method is proposed to extract speech features, capable of operating in noisy environments. Based on the time-frequency multi-resolution property of the wavelet transform, the input speech signal is decomposed into various frequency channels. For capturing the characteristics of the signal, the Mel-Frequency Cepstral...
Speaker Identification Using Admissible Wavelet Packet Based Decomposition
Mel Frequency Cepstral Coefficient (MFCC) features are widely used as acoustic features for speech recognition as well as speaker recognition. In MFCC feature representation, the Mel frequency scale is used to get a high resolution in low frequency region, and a low resolution in high frequency region. This kind of processing is good for obtaining stable phonetic information, but not suitable f...
New Filter Structure based on Admissible Wavelet Packet Transform for Text-Independent Speaker Identification
Identical acoustic features such as Mel frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC) are widely used for different tasks like speech recognition and speaker recognition, although the requirements of speaker recognition differ from those of speech recognition. In MFCC feature representation, the Mel frequency scale is used to get a high resolutio...
Discrete Wavelet Transform & Linear Prediction Coding Based Method for Speech Recognition via Neural Network
In the proposed work, wavelet transform (WT) and neural network techniques are applied to speech-based text-independent speaker identification and Arabic vowel recognition. A feature extraction method based on the linear prediction coding coefficients (LPCC) of the level-3 discrete wavelet transform (DWT) was developed. Feature vector fed to probabilistic neural networks (PNN) for classifica...
A Wavelet Based Approach for Speaker Identification from Degraded Speech
This paper presents a robust speaker identification method for degraded speech signals. This method is based on the Mel-frequency cepstral coefficients (MFCCs) for feature extraction from the degraded speech signals and the wavelet transform of these signals. It is known that the MFCC-based speaker identification method is not robust enough in the presence of noise and telephone degradations....
Publication date: 2015